-
Notifications
You must be signed in to change notification settings - Fork 813
feat: Add NETopKV function. #1251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
cb2af0c to
1f0778c
Compare
| */ | ||
| void configure(const ITensor *predictions, const ITensor *targets, ITensor *output, const unsigned int k); | ||
|
|
||
| /** Static function to check if given info will lead to a valid configuration of @ref CPPTopKVKernel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CpuTopKVKernel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in next patchset
src/cpu/operators/CpuTopKV.h
Outdated
| void | ||
| configure(const ITensorInfo *predictions, const ITensorInfo *targets, ITensorInfo *output, const unsigned int k); | ||
|
|
||
| /** Static function to check if given info will lead to a valid configuration of @ref CPPTopKVKernel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just say "similar to @ref CpuTopKV::configure()" to reduce duplication? Have a look at arm_compute/runtime/experimental/operators/CpuActivation.h
Same comment is valid for every header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in next patchset
| #define ACL_TESTS_VALIDATION_FIXTURES_TOPKVLAYERFIXTURE_H | ||
|
|
||
| #include "arm_compute/core/TensorShape.h" | ||
| #include "arm_compute/core/Types.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need anything particular from this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed them in next patch
89cd2b8 to
cd41038
Compare
|
|
||
| #include "src/core/common/Macros.h" | ||
| #include "src/cpu/ICpuKernel.h" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at more carefully, I think we need to include the STL vector, string and cstdint headers in this file; not inside the corresponding cpp. Also, type_traits for std::add_pointer.
Same goes for ITensorInfo.h, ITensor.h.
We also need to include
- arm_compute/core/Window.h for Window
- arm_compute/core/CPP/CPPTypes.h for CPUInfo
- src/cpu/kernels/CpuKernelSelectionTypes.h for DataTypeSelector.. stuff
- arm_compute/core/Error.h for Status
In a nutshell, this file should be self-sufficient and we don't need to repeat the same includes in the corresponding .cpp file for brevity.
Can you please review this file and CpuTopKVKernek.cpp accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in next patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string and type_traits are not added here.
We should also add ITensorInfo.h, ITensor.h here and remove from CpuTopKVKernel.cpp.
|
|
||
| for (unsigned int n = 0; n < N; ++n) | ||
| { | ||
| const uint32_t target = *reinterpret_cast<const uint32_t *>(targets(Coordinates{static_cast<int>(n)})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
targets[i]??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this not work?
541f913 to
e0090dc
Compare
* The Neon(TM) implementation of TopKV reduces execution time from 447.8 ms (CPP) to 11.65 ms for the same workload (F32, C=1000, N=32000, k=3, 6 threads), achieving an approximate 38× speedup. This gain comes from SIMD vectorization, removal of per-element branches, and a more efficient inner loop * Resolves ARMCL-1227. Change-Id: Ifdf161ce4254dc5ecd57aff9ae22410facd31705 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
e0090dc to
11238b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copyright should be 2026
|
|
||
| #include "src/core/common/Macros.h" | ||
| #include "src/cpu/ICpuKernel.h" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string and type_traits are not added here.
We should also add ITensorInfo.h, ITensor.h here and remove from CpuTopKVKernel.cpp.
| // Load 8 fp16 | ||
| const float16x8_t v16 = vld1q_f16(reinterpret_cast<const __fp16 *>(ptr)); | ||
|
|
||
| // Compare in fp32 (same correctness story you already debugged) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we mean by "you already debugged" here?
|
|
||
| } // namespace detail | ||
|
|
||
| void topkv_fp32_neon(const ITensor *in1, const ITensor *in2, ITensor *out, uint32_t k, const Window &win) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change these to predictions & targets in all classes and functions? (Except the operator list.dox unfortunately due to the convention there). There is a mix-up of in1/src0 in different files.
| * |S32 |U32 |U8 | | ||
| * |F16 |U32 |U8 | | ||
| * |F32 |U32 |U8 | | ||
| * |U8 |U32 |U8 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this U8 line extra here
|
|
||
| #include "tests/framework/datasets/Datasets.h" | ||
|
|
||
| #include <type_traits> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is type_traits here? I think we only need , am I missing anything?
|
|
||
| TEST_SUITE(NEON) | ||
| TEST_SUITE(TopKVLayer) | ||
| // clang-format on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need these two lines.
| TEST_SUITE(TopKVLayer) | ||
|
|
||
| template <typename T> | ||
| using CPPTopKVLayerFixture = TopKVValidationFixture<Tensor, Accessor, CPPTopKV, T>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that it was useful when comparing this implementation but we never have a CPP test suite in the NEON/ directory. The build system might miss to compile this test because we might ignore (we may already be ignoring) any files with under tests/validation/NEON when neon=0 option is provided to Scons.
I think these tests are valuable. Let's keep them but put them under tests/validation/CPP/TopKV.cpp
|
|
||
| for (unsigned int n = 0; n < N; ++n) | ||
| { | ||
| const uint32_t target = *reinterpret_cast<const uint32_t *>(targets(Coordinates{static_cast<int>(n)})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this not work?
| /** Initialise the kernel's input, dst and border mode. | ||
| * | ||
| * @param[in] src0 First input tensor info. Data types supported: QASYMM8/QASYMM8_SIGNED/F16/F32/S32 | ||
| * @param[in] src1 Second input tensor info. Data types supported: U32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're missing the argument k here
|
|
||
| // targets must match batch | ||
| // targets is expected to contain N elements (shape [N]) | ||
| ARM_COMPUTE_RETURN_ERROR_ON_MSG(src1.num_dimensions() < 1, "targets must have at least 1 dimension"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only need to copy what's already inside CPPTopKVKernel.cpp:
ARM_COMPUTE_RETURN_ERROR_ON(predictions->num_dimensions() > 2);
ARM_COMPUTE_RETURN_ERROR_ON(targets->num_dimensions() > 1);
The Neon(TM) implementation of TopKV reduces execution time from 447.8 ms (scalar CPP) to 11.65 ms for the same workload (F32, C=1000, N=32000, k=3, 6 threads), achieving an approximate 38× speedup. This gain comes from SIMD vectorization, removal of per-element branches, and a more efficient inner loop.
Resolves ARMCL-1227
Change-Id: Ifdf161ce4254dc5ecd57aff9ae22410facd31705